Enhancing Map-Reduce Framework for Bigdata with Hierarchical Clustering
Author
Abstract
MapReduce is a software framework that allows certain kinds of parallelizable or distributable problems involving large data sets to be solved on computing clusters. This paper describes our experience of grouping internet users by mining web access logs of up to 500 gigabytes. The application is built on hierarchical clustering algorithms running on MapReduce. However, a direct implementation of these algorithms is inefficient: it runs short of memory and takes a long time to execute. This paper presents an efficient hierarchical clustering method for mining large datasets with MapReduce. The method includes two optimization techniques: Batch Updating, which reduces computation time and communication costs among cluster nodes, and Co-occurrence based feature selection, which reduces the dimension of the feature vectors and eliminates noise features.
KEYWORDS: Hierarchical clustering, Batch Updating, Feature selection, MapReduce, Bigdata
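The abstract does not spell out how Batch Updating works. One plausible reading is that several closest cluster pairs are merged in each iteration instead of one, so fewer rounds (and, on a cluster, fewer MapReduce jobs) are needed. The sketch below illustrates that idea in a single process, under that assumption. It is not the paper's implementation: the function names, the centroid distance, and the `batch_size` parameter are all hypothetical.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def batch_agglomerate(points, target_clusters, batch_size):
    """Agglomerative clustering that merges up to `batch_size` disjoint
    closest pairs per round (assumed reading of "Batch Updating").

    Returns (clusters, rounds). Hypothetical helper, not from the paper.
    """
    if target_clusters < 1 or batch_size < 1:
        raise ValueError("target_clusters and batch_size must be >= 1")

    clusters = [[p] for p in points]
    rounds = 0
    while len(clusters) > target_clusters:
        cents = [centroid(c) for c in clusters]
        # All pairwise centroid distances, closest first. On a real
        # cluster this is the step that would be distributed.
        pairs = sorted(
            (dist(cents[i], cents[j]), i, j)
            for i, j in combinations(range(len(clusters)), 2)
        )
        # Never merge past the requested number of clusters.
        budget = min(batch_size, len(clusters) - target_clusters)
        used = set()
        merges = []
        for _, i, j in pairs:
            if len(merges) == budget:
                break
            if i in used or j in used:
                continue  # a cluster may take part in one merge per round
            used.update((i, j))
            merges.append((i, j))
        merged = [clusters[i] + clusters[j] for i, j in merges]
        untouched = [c for k, c in enumerate(clusters) if k not in used]
        clusters = untouched + merged
        rounds += 1
    return clusters, rounds
```

With `batch_size=1` this reduces to ordinary one-merge-per-round agglomeration. Larger batches trade some merge-order fidelity for fewer rounds, which is the cost the abstract says the technique targets.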
Similar resources
A New Agglomerative Hierarchical Clustering Algorithm Implementation based on the Map Reduce Framework
Text clustering is a difficult and active research field in text mining. Combining the Map Reduce framework and the neuron initialization method of the VPSOM (vector pressing Self-Organizing Model) algorithm, a new text clustering algorithm is presented. It divides the large text vector dataset into data blocks, each of which is then processed in a different distributed data node of Map Red...
Application of 3D-QSAR on a Series of Potent P38-MAP Kinase Inhibitors
One of the most widely applied methods in the drug industry for the development of new drugs is the 3D-QSAR methodology. Because p38-mitogen-activated protein kinase (p38-MAPK) plays a crucial role in regulating the production of proinflammatory cytokines such as tumor necrosis factor-α (TNF-α) and interleukin-1, and is emerging as an attractive target for new anti-inflammatory agents, we used a 3D-QSAR based method of Compa...
Hierarchical Analysis of Effective Components on Auditors' Professional Liability Insurance to Develop Ethical Framework
Background: The development of the insurance industry and the accounting system are two important pillars of economic development of any country and the ethical responsibility of auditors requires them to reduce the risk of their activities. The present study has developed a hierarchical analysis of the components affecting the professional liability insurance of auditors in order to develop th...
A Study of Data Management Technology for Handling Big Data
The amount of data is increasing daily. Data requires storage and effective processing for information retrieval. Both are challenging in the case of big data because of its velocity, variety, and volume. It requires different management and efficient information retrieval schemes. There are different techniques available for the management of big data. The distribution of the storage and the pro...
Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members
Graphs have many applications in real-world problems. When we deal with a huge volume of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but it does not let us select the number of clusters and the number of members ...